Encoding and Ranking Similar Chinese Characters

Authors

  • Ming Liu
  • Vasile Rus
  • Qiang Liao
  • Li Liu
Abstract

Automatically detecting similar Chinese characters is useful in many areas, such as building intelligent authoring tools (e.g., automatic multiple-choice question generation) for computer-assisted language learning. Previous work on computing Chinese character similarity focused on glyph similarity while ignoring other character features, such as pronunciation and meaning. In this article, we present a way of encoding 4,500 simplified Chinese characters in terms of glyph, pronunciation and meaning, annotating similar Chinese characters, and automatically ranking similar characters using a learning-to-rank approach. The experimental results indicate that this approach could be useful for ranking and recognizing similar Chinese characters in terms of glyph, pinyin and semantic meaning. Moreover, the listwise learning-to-rank method (ListNet) was found to be more effective than the pointwise (MART) and pairwise (RankNet) methods.
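To make the listwise idea concrete, the following is a minimal sketch of the top-one cross-entropy objective commonly associated with ListNet, written in plain Python. It is not the authors' implementation: the function names, the candidate characters, and the relevance labels and model scores are all hypothetical, and the paper's actual glyph, pinyin and meaning features are not reproduced here.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(predicted_scores, relevance_labels):
    """Cross-entropy between the top-one distributions induced by the
    annotated relevance labels and by the model's predicted scores."""
    p_true = softmax(relevance_labels)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def rank_candidates(candidates, predicted_scores):
    """Return candidates sorted from highest to lowest predicted score."""
    order = sorted(range(len(candidates)),
                   key=lambda i: predicted_scores[i], reverse=True)
    return [candidates[i] for i in order]


# Hypothetical example: candidate characters for one query character,
# with made-up similarity labels and made-up model scores.
candidates = ["已", "巳", "记"]
labels = [2.0, 2.0, 1.0]   # illustrative annotation values only
scores = [1.4, 0.9, 1.1]   # illustrative model outputs only

print(listnet_top_one_loss(scores, labels))
print(rank_candidates(candidates, scores))
```

The loss is computed over the whole candidate list at once, which is what distinguishes a listwise method from pointwise methods (one candidate at a time) and pairwise methods (one pair at a time).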


Similar resources

Distributional Similarity for Chinese: Exploiting Characters and Radicals

Distributional Similarity has attracted considerable attention in the field of natural language processing as an automatic means of countering the ubiquitous problem of sparse data. As a logographic language, Chinese words consist of characters and each of them is composed of one or more radicals. The meanings of characters are usually highly related to the words which contain them. Likewise, r...

Keyboard for inputting Chinese language

1.1 Technique of inputting Chinese characters. As the structure of Chinese characters is very different from the relatively simple alphabetic system of Western languages, it is very difficult to input Chinese characters into a computer quickly and conveniently. There are a few existing systems which include those based on the "PinYin" (phonetic) system, a combination of the PinYin system and chara...

A Hybrid Chinese Information Retrieval Model

A distinctive feature of Chinese text is that a Chinese document is a sequence of Chinese characters with no space or boundary between Chinese words. This feature makes Chinese information retrieval more difficult, since a retrieved document which contains the query term as a sequence of Chinese characters may not be really relevant to the query since the query term (as a sequence of Chinese characters) may n...

On the Ranking Property and Underlying Dynamics of Complex Systems

Ranking procedures are widely used to describe phenomena in many different fields of the social and natural sciences, e.g., sociology, economics, linguistics, demography, physics, biology, etc. In this dissertation, we study the ranking properties and underlying dynamics embedded in complex systems. In particular, we focused on the scores/prizes ranking in sports systems and the wo...

Machine Recognition of Hand-Printed Chinese Characters

The recognition of Chinese characters has been an area of great interest for many years, and a large number of research papers and reports have already been published in this area. There are several major problems with Chinese character recognition: Chinese characters are distinct and ideographic, the character size is very large and many structurally similar characters exist in the character s...


Journal:
  • J. Inf. Sci. Eng.

Volume 33  Issue 

Pages  -

Publication date: 2017